Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures

نویسندگان

  • Pei-Hung Lin
  • Qing Yi
  • Daniel J. Quinlan
  • Chunhua Liao
  • Yongqing Yan
چکیده

This paper presents a system for automatically supporting the optimization of stencil kernels on emerging Non-Uniform Memory Access(NUMA) many-core architectures, through a combined compiler + runtime approach. In particular, we use a pragma-driven compiler to recognize the special structures and optimization needs of stencil computations and thereby to automatically generate low-level code that efficiently utilize the data placement and management support of a C++ runtime on top of NUMA API, a programming interface to the NUMA policy supported by the Linux kernel. Our results show that through automated specialization of code generation, this approach provides a combined benefit of performance, portability, and productivity for developers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Generalized Framework for Auto-tuning Stencil Computations

This work introduces a generalized framework for automatically tuning stencil computations to achieve superior performance on a broad range of multicore architectures. Stencil (nearest-neighbor) based kernels constitute the core of many important scientific applications involving block-structured grids. Auto-tuning systems search over optimization strategies to find the combination of tunable p...

متن کامل

Optimizing Stencil Computations: Multicore-optimized Wavefront Diamond Blocking on Shared and Distributed Memory Systems

Iterative Stencil Computations (ISC) appear in wide variety of scientific applications, partial differential equation (PDE) solvers being the most important one. In iterative stencil computations, each point in a multi-dimensional spatial grid is updated using weighted contributions from its neighbor points, defined by the stencil operator. The stencil operator specifies the relative coordinate...

متن کامل

Modeling Stencil Computations on Modern HPC Architectures

Stencil computations are widely used for solving Partial Differential Equations (PDEs) explicitly by Finite Difference schemes. The stencil solver alone -depending on the governing equationcan represent up to 90% of the overall elapsed time, of which moving data back and forth from memory to CPU is a major concern. Therefore, the development and analysis of source code modifications that can ef...

متن کامل

Auto-tuning the 27-point Stencil for Multicore

This study focuses on the key numerical technique of stencil computations, used in many different scientific disciplines, and illustrates how auto-tuning can be used to produce very efficient implementations across a diverse set of current multicore architectures.

متن کامل

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016